Towns County
An HTR-LLM Workflow for High-Accuracy Transcription and Analysis of Abbreviated Latin Court Hand
This article presents and validates an ideal, four-stage workflow for the high-accuracy transcription and analysis of challenging medieval legal documents. The process begins with a specialized Handwritten Text Recognition (HTR) model, itself created using a novel "Clean Ground Truth" curation method where a Large Language Model (LLM) refines the training data. This HTR model provides a robust baseline transcription (Stage 1). In Stage 2, this baseline is fed, along with the original document image, to an LLM for multimodal post-correction, grounding the LLM's analysis and improving accuracy. The corrected, abbreviated text is then expanded into full, scholarly Latin using a prompt-guided LLM (Stage 3). A final LLM pass performs Named-Entity Correction (NEC), regularizing proper nouns and generating plausible alternatives for ambiguous readings (Stage 4). We validate this workflow through detailed case studies, achieving Word Error Rates (WER) in the range of 2-7% against scholarly ground truths. The results demonstrate that this hybrid, multi-stage approach effectively automates the most laborious aspects of transcription while producing a high-quality, analyzable output, representing a powerful and practical solution for the current technological landscape.
- North America > Canada > Ontario > Toronto (0.14)
- North America > United States > Texas > Harris County > Houston (0.04)
- North America > United States > Georgia > Towns County (0.04)
- Europe > United Kingdom > England > Hertfordshire (0.04)
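The Word Error Rate (WER) quoted in the abstract above is conventionally defined as the word-level edit distance between a hypothesis and a reference transcription, divided by the number of reference words. The following is a minimal sketch under that conventional definition, assuming plain whitespace tokenization; the article's own normalization (case, punctuation, abbreviation marks) is not stated here, and the function name `word_error_rate` is illustrative only.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by reference length.

    Assumption: tokens are split on whitespace with no further normalization.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)
```

Under this definition, a WER in the 2-7% range corresponds to roughly 2 to 7 word-level edits per 100 reference words.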
Improving Retrospective Language Agents via Joint Policy Gradient Optimization
Feng, Xueyang, Lan, Bo, Dai, Quanyu, Wang, Lei, Tang, Jiakai, Chen, Xu, Dong, Zhenhua, Wen, Ji-Rong
Large language models (LLMs) have sparked great interest in creating autonomous agents. However, current prompt-based agents often rely heavily on large-scale LLMs. Meanwhile, although fine-tuning methods significantly enhance the capabilities of smaller LLMs, the fine-tuned agents often lack the potential for self-reflection and self-improvement. To address these challenges, we introduce RetroAct, a novel agent framework that jointly optimizes both task-planning and self-reflective evolution capabilities in language agents. Specifically, we develop a two-stage joint optimization process that integrates imitation learning and reinforcement learning, and design an off-policy joint policy gradient optimization algorithm with imitation learning regularization to enhance the data efficiency and training stability in agent tasks. RetroAct significantly improves the performance of open-source models, reduces dependency on closed-source LLMs, and enables fine-tuned agents to learn and evolve continuously. We conduct extensive experiments across various testing environments, demonstrating that RetroAct substantially improves task performance and decision-making processes.
- Europe > United Kingdom > England (0.04)
- Europe > Norway (0.04)
- North America > United States > Arkansas (0.04)
- Information Technology > Security & Privacy (0.67)
- Education > Educational Setting (0.47)
WavePulse: Real-time Content Analytics of Radio Livestreams
Mittal, Govind, Gupta, Sarthak, Wagle, Shruti, Chopra, Chirag, DeMattee, Anthony J, Memon, Nasir, Ahamad, Mustaque, Hegde, Chinmay
Radio remains a pervasive medium for mass information dissemination, with AM/FM stations reaching more Americans than either smartphone-based social networking or live television. Increasingly, radio broadcasts are also streamed online and accessed over the Internet. We present WavePulse, a framework that records, documents, and analyzes radio content in real-time. While our framework is generally applicable, we showcase the efficacy of WavePulse in a collaborative project with a team of political scientists focusing on the 2024 Presidential Elections. We use WavePulse to monitor livestreams of 396 news radio stations over a period of three months, processing close to 500,000 hours of audio streams. These streams were converted into time-stamped, diarized transcripts and analyzed to answer key political science questions at both the national and state levels. Our analysis revealed how local issues interacted with national trends, providing insights into information flow. Our results demonstrate WavePulse's efficacy in capturing and analyzing content from radio livestreams sourced from the Web. Code and dataset can be accessed at https://wave-pulse.io.
- Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.14)
- North America > United States > New York > Kings County > New York City (0.04)
- North America > United States > Washington > King County > Seattle (0.04)
- Media > Radio (1.00)
- Leisure & Entertainment (1.00)
- Government > Voting & Elections (1.00)
- Government > Regional Government > North America Government > United States Government (1.00)
Spatio-Temporal Conformal Prediction for Power Outage Data
Jiang, Hanyang, Xie, Yao, Qiu, Feng
With global climate change, extreme weather events like hurricanes, winter storms, and tornadoes have increasingly led to widespread electric power outages across the United States [14]. For instance, during March 2018, the northeastern U.S. was battered by three consecutive winter storms within a span of 14 days. This series of events caused power outages that left over 2.75 million customers without electricity in the New England region, resulting in economic losses of approximately $4 billion, including $2.9 billion in insured damages [8]. Such severe weather-related incidents often leave millions without power for extended periods, resulting in significant economic disruption [19] and, tragically, sometimes even loss of life [25]. Given the considerable impact of extreme weather on power systems since the early 2000s, regulatory bodies in the U.S. have called on the energy sector to enhance the resilience of power grids through various hardening measures [1]. Consequently, accurately assessing the resilience of power grids is crucial not only for estimating potential damage from extreme weather but also for informing short-term disaster response strategies, long-term resilience planning, and shaping energy policy.
- North America > United States > Massachusetts (0.05)
- North America > United States > South Carolina (0.05)
- North America > United States > North Carolina (0.04)
DISCount: Counting in Large Image Collections with Detector-Based Importance Sampling
Perez, Gustavo, Maji, Subhransu, Sheldon, Daniel
Many modern applications use computer vision to detect and count objects in massive image collections. However, when the detection task is very difficult or in the presence of domain shifts, the counts may be inaccurate even with significant investments in training data and model development. We propose DISCount -- a detector-based importance sampling framework for counting in large image collections that integrates an imperfect detector with human-in-the-loop screening to produce unbiased estimates of counts. We propose techniques for solving counting problems over multiple spatial or temporal regions using a small number of screened samples and estimate confidence intervals. This enables end-users to stop screening when estimates are sufficiently accurate, which is often the goal in a scientific study. On the technical side we develop variance reduction techniques based on control variates and prove the (conditional) unbiasedness of the estimators. DISCount leads to a 9-12x reduction in the labeling costs over naive screening for tasks we consider, such as counting birds in radar imagery or estimating damaged buildings in satellite imagery, and also surpasses alternative covariate-based screening approaches in efficiency.
- North America > United States > Michigan > Wayne County > Detroit (0.04)
- North America > United States > Massachusetts > Hampshire County > Amherst (0.04)
- North America > United States > Georgia > Towns County (0.04)
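The core idea named in the DISCount abstract above, using an imperfect detector as the proposal distribution for importance sampling, can be sketched as follows. This is a generic importance-sampling estimator written under stated assumptions, not the authors' implementation: the names `importance_sampled_count`, `detector_counts`, and `screen`, and the smoothing constant `eps`, are hypothetical, and the control-variate variance reduction and confidence intervals mentioned in the abstract are omitted.

```python
import random
from typing import Callable, Sequence


def importance_sampled_count(
    detector_counts: Sequence[float],
    screen: Callable[[int], float],
    n_samples: int,
    rng: random.Random,
    eps: float = 1e-3,
) -> float:
    """Estimate the total true count over all images.

    Images are sampled with probability proportional to the detector's
    count (plus eps). Each screened count is divided by its sampling
    probability, which keeps the estimate unbiased as long as every image
    with a nonzero true count has a nonzero sampling probability.
    """
    weights = [g + eps for g in detector_counts]
    total_weight = sum(weights)
    q = [w / total_weight for w in weights]  # proposal distribution

    # Draw image indices with replacement according to q.
    sampled = rng.choices(range(len(q)), weights=q, k=n_samples)

    # `screen(i)` stands in for a human counting objects in image i.
    return sum(screen(i) / q[i] for i in sampled) / n_samples
```

When the detector's counts are roughly proportional to the true counts, the ratio `screen(i) / q[i]` varies little across images, so few screened samples are needed. A uniform proposal corresponds to the naive screening baseline.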
HOP, UNION, GENERATE: Explainable Multi-hop Reasoning without Rationale Supervision
Zhao, Wenting, Chiu, Justin T., Cardie, Claire, Rush, Alexander M.
Explainable multi-hop question answering (QA) not only predicts answers but also identifies rationales, i.e., subsets of input sentences used to derive the answers. This problem has been extensively studied under the supervised setting, where both answer and rationale annotations are given. Because rationale annotations are expensive to collect and not always available, recent efforts have been devoted to developing methods that do not rely on supervision for rationales. However, such methods have limited capacities in modeling interactions between sentences, let alone reasoning across multiple documents. This work proposes a principled, probabilistic approach for training explainable multi-hop QA systems without rationale supervision. Our approach performs multi-hop reasoning by explicitly modeling rationales as sets, enabling the model to capture interactions between documents and sentences within a document. Experimental results show that our approach is more accurate at selecting rationales than the previous methods, while maintaining similar accuracy in predicting answers.
- North America > United States > Montana (0.28)
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
- Leisure & Entertainment (1.00)
- Transportation > Infrastructure & Services > Airport (0.46)
- Transportation > Air (0.46)
- Media > Film (0.46)
The Role of Pre-training Data in Transfer Learning
Entezari, Rahim, Wortsman, Mitchell, Saukh, Olga, Shariatnia, M. Moein, Sedghi, Hanie, Schmidt, Ludwig
The transfer learning paradigm of model pre-training and subsequent fine-tuning produces high-accuracy models. While most studies recommend scaling the pre-training size to benefit most from transfer learning, a question remains: what data and method should be used for pre-training? We investigate the impact of pre-training data distribution on the few-shot and full fine-tuning performance using 3 pre-training methods (supervised, contrastive language-image and image-image), 7 pre-training datasets, and 9 downstream datasets. Through extensive controlled experiments, we find that the choice of the pre-training data source is essential for the few-shot transfer, but its role decreases as more data is made available for fine-tuning. Additionally, we explore the role of data curation and examine the trade-offs between label noise and the size of the pre-training dataset. We find that using 2000X more pre-training data from LAION can match the performance of supervised ImageNet pre-training. Furthermore, we investigate the effect of pre-training methods, comparing language-image contrastive vs. image-image contrastive, and find that the latter leads to better downstream accuracy.
- Europe > Portugal > Lisbon > Lisbon (0.14)
- Europe > Austria > Styria > Graz (0.04)
- North America > United States > Oregon (0.04)
- Research Report > New Finding (0.93)
- Research Report > Experimental Study (0.68)
Where did you tweet from? Inferring the origin locations of tweets based on contextual information
Lamsal, Rabindra, Harwood, Aaron, Read, Maria Rodriguez
Public conversations on Twitter comprise many pertinent topics including disasters, protests, politics, propaganda, sports, climate change, epidemics/pandemic outbreaks, etc., that can have both regional and global aspects. Spatial discourse analysis relies on geographical data. However, today fewer than 1% of tweets are geotagged, whether with point location or bounding place information. A major issue with tweets is that Twitter users can be at location A and exchange conversations specific to location B, which we call the Location A/B problem. The problem is considered solved if location entities can be classified as either origin locations (Location As) or non-origin locations (Location Bs). In this work, we propose a simple yet effective framework--the True Origin Model--to address the problem that uses machine-level natural language understanding to identify tweets that conceivably contain their origin location information. The model achieves promising accuracy at country (80%), state (67%), city (58%), county (56%) and district (64%) levels with support from a Location Extraction Model as basic as the CoNLL-2003-based RoBERTa. We employ a tweet contextualizer (locBERT), which is one of the core components of the proposed model, to investigate multiple tweets' distributions for understanding Twitter users' tweeting behavior in terms of mentioning origin and non-origin locations. We also highlight a major concern with the currently regarded gold standard test set (ground truth) methodology, introduce a new data set, and identify further research avenues for advancing the area.
- Research Report (1.00)
- Personal > Interview (0.40)
- Information Technology > Services (1.00)
- Health & Medicine > Therapeutic Area > Immunology (1.00)
- Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (0.96)
Multi-Type Conversational Question-Answer Generation with Closed-ended and Unanswerable Questions
Hwang, Seonjeong, Kim, Yunsu, Lee, Gary Geunbae
Conversational question answering (CQA) facilitates an incremental and interactive understanding of a given context, but building a CQA system is difficult for many domains due to the problem of data scarcity. In this paper, we introduce a novel method to synthesize data for CQA with various question types, including open-ended, closed-ended, and unanswerable questions. We design a different generation flow for each question type and effectively combine them in a single, shared framework. Moreover, we devise a hierarchical answerability classification (hierarchical AC) module that improves the quality of the synthetic data while acquiring unanswerable questions. Manual inspections show that synthetic data generated with our framework have characteristics very similar to those of human-generated conversations. Across four domains, CQA systems trained on our synthetic data show performance close to that of systems trained on human-annotated data.
- Asia > Middle East > Saudi Arabia (0.14)
- North America > United States > Pennsylvania (0.05)
- North America > United States > New York (0.05)
Synthetic Map Generation to Provide Unlimited Training Data for Historical Map Text Detection
Li, Zekun, Guan, Runyu, Yu, Qianmu, Chiang, Yao-Yi, Knoblock, Craig A.
Many historical map sheets are publicly available for studies that require long-term historical geographic data. The cartographic design of these maps includes a combination of map symbols and text labels. Automatically reading text labels from map images could greatly speed up map interpretation and help generate rich metadata describing the map content. Many text detection algorithms have been proposed to locate text regions in map images automatically, but most of the algorithms are trained on out-of-domain datasets (e.g., scenic images). Training data determines the quality of machine learning models, and manually annotating text regions in map images is labor-intensive and time-consuming. On the other hand, existing geographic data sources, such as OpenStreetMap (OSM), contain machine-readable map layers, which allow us to separate out the text layer and obtain text label annotations easily. However, the cartographic styles of OSM map tiles and historical maps are significantly different. This paper proposes a method to automatically generate an unlimited amount of annotated historical map images for training text detection models. We use a style transfer model to convert contemporary map images into historical style and place text labels upon them. We show that the state-of-the-art text detection models (e.g., PSENet) can benefit from the synthetic historical maps and achieve significant improvement for historical map text detection.
- North America > United States > District of Columbia > Washington (0.05)
- Europe > United Kingdom > Scotland (0.04)
- North America > United States > New York (0.04)
- Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.90)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Spatial Reasoning (0.68)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)